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Abstract 

The problem of reducing the bias of maximum likelihood estimator in a general multivariate elliptical regression 
model is considered. The model is very flexible and allows the mean vector and the dispersion matrix to have parame¬ 
ters in common. Many frequently used models are special cases of this general formulation, namely: errors-in-variables 
models, nonlinear mixed-effects models, heteroscedastic nonlinear models, among others. In any of these models, the 
vector of the errors may have any multivariate elliptical distribution. We obtain the second-order bias of the maximum 
likelihood estimator, a bias-corrected estimator, and a bias-reduced estimator. Simulation results indicate the effective¬ 
ness of the bias correction and bias reduction schemes. 

Keywords: Bias correction; bias reduction; elliptical model; maximum likelihood estimation; general parameteri¬ 
zation. 


1 Introduction 

It is well known that, under some standard regularity conditions, maximum-likelihood estimators (MLEs) are consistent 
and asymptotically normally distributed. Hence, their biases converge to zero when the sample size increases. However, 
for finite sample sizes, the MLEs are in general biased and bias correction plays an important role in the point estimation 
theory. 

A general expression for the term of order 0{n~^) in the expansion of the bias of MLEs was given by Cox and 
Snell (1968). This term is often called second-order bias and can be useful in actual problems. Eor instance, a very high 
second-order bias indicates that other than maximum-likelihood estimation procedures should be used. Also, corrected 
estimators can be formulated by subtracting the estimated second-order biases from the respective MLEs. It is expected 
that these corrected estimators have smaller biases than the uncorrected ones, especially in small samples. 
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Cox and Snell’s formulae for second-order biases of MLEs were applied in many models. Cordeiro and McCullagh 
(1991) use these formulae in generalized linear models; Cordeiro and Klein (1994) compute them for ARMA models; 
Cordeiro et al. (2000) apply them for symmetric nonlinear regression models; Vasconcellos and Cordeiro (2000) obtain 
them for multivariate nonlinear Student t regression models. More recently, Cysneiros et al. (2010) study the univariate 
heteroscedastic symmetric nonlinear regression models (which are an extension of Cordeiro et al. 2000) and Patriota 
and Lemonte (2009) obtain a general matrix formula for the bias correction in a multivariate normal model where the 
mean and the covariance matrix have parameters in common. 

An alternative approach to bias correction was suggested by Firth (1993). The idea is to adjust the estimating function 
so that the estimate becomes less biased. This approach can be viewed as a “preventive” method, since it modifies the 
original score function, prior to obtaining the parameter estimates. In this paper, estimates obtained from Cox and 
Snell’s approach and Firth’s method will be called bias-corrected estimates and bias-reduced estimates, respectively. 
Firth showed that in generalized linear models with canonical link function the preventive method is equivalent to 
maximizing a penalized likelihood that is easily implemented via an iterative adjustment of the data. The bias reduction 
proposed by Firth has received considerable attention in the statistical literature. For models for binary data, see Mehrabi 
and Matthews (1995); for censored data with exponential lifetimes, see Pettitt et al. (1998). In Bull et al. (2002) bias 
reduction is obtained for the multinomial logistic regression model. In Kosmidis and Firth (2009) a family of bias- 
reducing adjustments was developed for a general class of univariate and multivariate generalized nonlinear models. The 
bias reduction in cumulative link models for ordinal data was studied in Kosmidis (2014). Additionally, Kosmidis and 
Firth (2011) showed how to obtain the bias-reducing penalized maximum likelihood estimator by using the equivalent 
Poisson log-linear model for the parameters of a multinomial logistic regression. 

It is well-known and was noted by Firth (1993) and Kosmidis and Firth (2009) that the reduction in bias may 
sometimes be accompanied by inflation of variance, possibly yielding an estimator whose mean squared error is bigger 
than that of the original one. Nevertheless, published empirical studies such as those mentioned above show that, in some 
frequently used models, bias-reduced and bias-corrected estimators can perform better than the unadjusted maximum 
likelihood estimators, especially when the sample size is small. 

Our goal in this paper is to obtain bias correction and bias reduction to the maximum likelihood estimators for 
the general multivariate elliptical model. We extend the work of Patriota and Lemonte (2009) to the elliptical class of 
distributions defined in Lemonte and Patriota (2011). We focus on analytical methods only, because simulations for a 
general multivariate normal model suggests that analytical bias corrections outperforms the computationally intensive 
bootstrap methods (Lemonte, 2011). 

In order to illustrate the ampleness of the general multivariate elliptical model, we mention some of its submodels: 
multiple linear regression, heteroscedastic multivariate nonlinear regressions, nonlinear mixed-effects models (Patriota 
2011), heteroscedastic errors-in-variables models (Patriota et al. 2009a,b), structural equation models, multivariate 
normal regression model with general parametrization (Lemonte, 2011), simultaneous equation models and mixtures of 
them. It is important to note that the usual normality assumption of the error is relaxed and replaced by the assumption of 
elliptical errors. The elliptical family of distributions includes many important distributions such as multivariate normal. 
Student t, power exponential, contaminated normal, Pearson II, Pearson VII, and logistic, with heavier or lighter tails 
than the normal distribution; see Fang et al. (1990). 

The paper is organized as follows. Section |2] presents the notation and general results for bias correction and bias 
reduction. Section [^presents the model and our main results, namely the general expression for the second-order bias 
of MLFs, in the general multivariate elliptical model. Section |4] applies our results in four important special cases: 
heteroscedastic nonlinear (linear) model, nonlinear mixed-effects models, multivariate errors-in-variables models and 
log-symmetric regression models. Simulations are presented in Section|5] Applications that use real data are presented 
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and discussed in Section|6] Finally, Section |2]concludes the paper. Technical details are collected in one appendix. 


2 Bias correction and bias reduction 

Let d be the p-vector of unknown parameters and Br its rth element. Also, let U{9)he the score function and Ur {9) = Ur 
its rth element. We use the following tensor notation for the cumulants of the log-likelihood derivatives introduced by 
Lawley (1956): 


^rst — 


= E{UrUs), ^rs,t=E(^^Ut 


and so on. The indices r, s and t vary from 1 to p. The typical (r, s)th element of the Fisher information matrix 
K{9) is Kr,s and we denote by the corresponding element of K(9)~^. All k’s refer to a total over the sample and 
are, in general, of order n. Under standard regular conditions, we have that Urs = —Kr,s, i^rs,t = K^rs — Urst and 

(t) (s) (r) 

Kr,s,t = ‘^Krst — Urs — k)./ — n],/. These identities will be used to facilitate some algebraic operations. 


Let B-g(9) be the second-order bias vector of 9 whose jth element is (9), j = 1,2,..., p. It follows from the 
general expression for the multiparameter second-order biases of MLEs given by Cox and Snell (1968) that 




P 


r,s,t—l 




( 1 ) 


The bias corrected MLE is dehned as 

9bc = 9 — B-g{9). 

The bias-corrected estimator 9bc is expected to have smaller bias than the uncorrected estimator, 9. 

Eirth (1993) proposed an alternative method to partially remove the bias of MLEs. The method replaces the score 
function by its modihed version 

U*{9) = Ui9)-Ki9)Bffi9), 

and a modihed estimate, 9br, is given as a solution to U* (9) = 0. It is noticeable that, unlike Cox and Snell’s approach, 
Eirth’s bias reduction method does not depend on the hniteness of 9. 


3 Model and main results 

We shall follow the same notation presented in Lemonte and Patriota (2011). The elliptical model as dehned in Eang et 
al. (1990) follows. A g x 1 random vector Y has a multivariate elliptical distribution with location parameter p and a 
dehnite positive scale matrix E if its density function is 

fvijj) = |E|-i/^5((y - - p)), (2) 
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Table 1: Generating functions of some multivariate elliptical distributions. 


Distribution 

Generating function g{u) 

normal 

1 -u/2 

(\/2¥)'' 

Cauchy 

£^^-9/2(1 +y)-(l+?)/2 

Student t 

2 j ^ 

power exponential 

£^2-9/(2>)7r-9/2e-“'/2, A > 0 


where g : [0, oo) —>■ (0, oo) is called the density generating function, and it is such that ^g{u)du < oo. We will 

denote Y ^ Elq{g,, S, g) = E). It is possible to show that the characteristic function is = E(exp(i<^y)) = 

exp{it^ Yt), where t GMd and if : [0, oo) —>■ R. Then, if ip is twice differentiable at zero, we have thatE(F) = p 

and Var(y) = ^E, where ^ = ip'lQ). We assume that the density generating function g does not have any unknown 
parameter, which implies that ^ is a known constant. Erom @, when /r = 0 and E = Iq, where Iq is a q x q identity 
matrix, we obtain the spherical family of densities. A comprehensive exposition of the elliptical multivariate class of 
distributions can be found in Fang et al. (1990). Table 1 presents the density generating functions of some multivariate 
elliptical distributions. 

Let Yi, y 2 5 ■•■j be n independent random vectors, where Yi has dimension qt G N, for i = 1,2, The general 
multivariate elliptical model (Lemonte and Patriota 2011) assumes that 

T) fi {p, Xi )Tej, i 1,,., ,Ti, 

with Ci ElqP0,Yii{9,Wi)), where means “independently distributed as”, Xi and Wi are Wi x 1 and x 1 
nonstochastic vectors of auxiliary variables, respectively, associated with the ith observed response Y, which may have 
components in common. Then, 


Yi^^ElqPf^Y,), i = l,...,n, (3) 

where pi = fiiO, Xi) is the location parameter and E^ = Ei(0, Wi) is the definite positive scale matrix. Both fi and E^ 
have known functional forms and are twice differentiable with respect to each element of 0. Additionally, 0 is a p-vector 
of unknown parameters (where p < n and it is fixed). Since 9 must be identifiable in model Q, the functions pi and E^ 
must be defined to accomplish such restriction. 

Several important statistical models are special cases of the general formulation Q, for example, linear and nonlinear 
regression models, homoscedastic or heteroscedastic measurement error models, and mixed-effects models with normal 
errors. It is noteworthy that the normality assumption for the errors may be relaxed and replaced by any distribution 
within the class of elliptical distributions, such as the Student t and the power exponential distributions. The general 
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formulation allows a wide range of different specifications for the location and the scale parameters, coupled with a large 
collection of distributions for the errors. Section 4 presents four important particular cases of the main model Q that 
show the applicability of the general formulation. 

For the sake of simplifying the notation, let Zi = Yi — fj,i and Ui = zj The log-likelihood function associated 
with (O, is given by 

n 

£(e) = J2m, ( 4 ) 

where £i{0) = —4 log |Si| + \ogg{ui). It is assumed that g{-), gt and are such that t{9) is a regular log-likelihood 
function (Cox and Hinkley 1974, Ch. 9) with respect to 9. To obtain the score function and the Fisher information matrix, 
we need to derive (.{9) with respect to the unknown parameters and to compute some moments of such derivatives. We 
assume that such derivatives exist. Thus, we dehne 


and 


_ djM 

“ Q0 1 


^ dYi 

d9sd9r ’ “ d9r ’ 


d9sd9r 


lz(r) 


= 


i{r)^ 


T -1 


for r,s = 1,... ,p. We make use of matrix differentiation methods (Magnus and Neudecker 2007) to compute the 
derivatives of the log-likelihood function. The score vector and the Fisher information matrix for 9 can be shortly 
written as 

U{9) = F^Hs and K{9) = HF, (5) 

respectively, with F = ..., FI = block-diag {iJi,..., i7„}, s = i Sn )^’ H = FlMFl and 

M = block-diag {MJ ,..., M^}, wherein 


Fz = 



E, 0 

0 2Si 0 Si 


ViZi 

-vec(Sj - ViZizJ)\ ’ 


where the “vec” operator transforms a matrix into a vector by stacking the columns of the matrix, Di = (ai(i ),..., ai(^p)), 
Vi = (vec(C'i(i)),..., vec(C'i(p))), Vi = —2Wg{ui) and Wg{u) = dlogg{u)/du. Here, we assume that F has rankp 
(i.e., fii and Si must be dehned to hold such condition). The symbol “0” indicates the Kronecker product. Following 
Lange et al. (1989) we have, for the q-variate Student t distribution with i/ degrees of freedom, S, ly), that Wg(u) = 
— {v + q)/{2(y + u)}. Following Gomez et al. (1998) we have, for the q-variate power exponential PEq{fj,, 6, A) with 
shape parameter A > 0 and u ^ 0, that Wg{u) = —Xu^~^/2, A ^ 1/2. In addition, we have 


M,; = 


4V'i(2,i) ^ 

0 2c,: Si 


0 


+ (ci — 1) 


0 0 

0 vec(Si)vec(Si)''" 


where c, = 4^/>i(2,2)/{9i(gi + 2)}, ^/>i( 2 ,i) = E{W^{r,)ri) and ■!/'i( 2 . 2 ) = 

E{Wg{ri)rf), with Ci = ||Li|p, L, ~ Elq-{0, Iq-). Here, we assume that g{u) is such that ipi{ 2 , 2 ) ex¬ 

ist and are hnite for alH = 1,..., n. One can verify these results by using standard differentiation techniques and some 
standard matrix operations. 


The values of 4’i(i,k) ^e obtained from solving the following one-dimensional integrals (Lange et al. 1989): 

pOO 

V'zpA) = / W/,(s^)'p(s2)r.‘?>+2'=-ic,,ds, (6) 

^0 


5 













where Cq^ = 2 tt^ /r(^) is the surface area of the unit sphere in and r(a) is the well-known gamma function. One 
can hnd these quantities for many distributions simply solving (|6]l algebraically or numerically. Table 2 shows these 
quantities for the normal, Cauchy, Student t and power exponential distributions. 


Table 2: Functions '!/’i( 2 .i)> V'i( 2 , 2 )> ^i( 3 , 2 )> '0i(3,3) for normal, Cauchy, Student t and power exponential (P.E.) distribu- 
tions. 



V'j(2.1) 

Ai2,2) 

A(3,2) 

V'i(3.3) 


normal 

Qi 

4 

qi(qi+2) 

4 

qi(qi+2) 

8 

9i(9i+2)(9i+4) 

8 

72 > 1 

Cauchy 

9t(<?t + l) 

4(qi+3) 

qi(qi+'2){qi+i) 

4(qi+3) 

qi{qi+2)iqi + lf 

8iqi+3){qi+5) 

qi{qi-\-2){qi-\-4){qi + l)^ 

8{qi+3){qi+5) 

72 > 1 

Student t 

qi{qi+v) 

qi{qi+2)(qi+v) 

qi(qi+2}(qi+l^f 

qi(qi+2){qi+4)(qi+i^f 

7i > 1 

4(gi-l-i^+2) 

4iqi+l^+2) 

8{qi+2+l')(qi+4+u) 

8(qi+2+v){qi+4+v) 

P.E. 


2\+l 

4 


(2A+l)(4A-rl) 

8 

= 1, A > i 

P.E. 

2i/^r(ft) 

qi{2\+qi) 

4 

x^rm^+3) 

2i/-^r(ft) 

qi{2X-\-qi){4X-\-qi) 

8 72 

>2, A > 0 


It is important to remark that the involve unknown quantities (for instance, the degrees of freedom 

V of the Student t distribution and the shape parameter A of the power exponential distribution). One may want to 
estimate these quantities via maximum likelihood estimation. Here, we consider these as known quantities for the 
purpose of keeping the robustness property of some distributions. Lucas (1997) shows that the protection against “large” 
observations is only valid when the degrees of freedom parameter is kept hxed for the Student t distribution. Therefore, 
the issue of estimating these quantities is beyond of the main scope of this paper. In practice, one can use model selection 
procedures to choose the most appropriate values of such unknown parameters. 

Notice that, in the Fisher information matrix K{9), the matrix M carries all the information about the adopted 
distribution, while F and H contain the information about the adopted model. Also, K{6) has a quadratic form that can 
be computed through simple matrix operations. Under the normal case, Vi = 1, M = H~^ and hence H = H. 

The Fisher scoring method can be used to estimate 9 by iteratively solving the equation 

(^pirn)Tg{rn)p{rn)-^Q{rn+l) ^ pirn)Tgirn)^*(rn)^ m = 0,l,..., (7) 

where the quantities with the upper index “(m)” are evaluated at 9, m is the iteration counter and 

_ p{m)Q{m) ^{m) ^ 

Each loop, through the iterative scheme (|7]l, consists of an iterative re-weighted least squares algorithm to optimize the 
log-likelihood (|4li. Thus, (|5]) and (|7]l agree with the corresponding equations derived in Patriota and Lemonte (2009). 
Observe that, despite the complexity and generality of the postulated model, expressions (|5]l and 0 are very simple and 
friendly. 
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Now, we can give the main result of the paper. 

Theorem 3.1. The second-order bias vector B-g{0) under model 0 is given by 

( 8 ) 

where ^ , $p)vec((F^i/J^)“^), = (^7(r)> ■ • ■ given in the Appendix. 

Proof: See the Appendix. 


In many models the location vector and the scale matrix do not have parameters in common, i.e., = fii{6i,Xi) 

and Ei = Ei(02, Wi), where 6* = Therefore, = block-diag{^gj, ^@ 2 } and the parameter vectors 9i and 

02 will be orthogonal (Cox and Reid 1987). This happens in mixed models, nonlinear models, among others. However, 
in errors-in-variables and factor analysis models orthogonality does not hold. Model 0 is general enough to encompass 
a large number of models even those that do not have orthogonal parameters. 

Corollary 3.1. When = fj,i{0i,Xi) and = Ei( 6 * 2 , Wi), where 0 = {9j, 0j the second-order bias vector of 0i 
and 02 are given by 

and 

Bs,W = iF;^H2Fe,)-^F;^H2^2, 

respectively. The quantities Fg^, Fg^, Fli, H 2 , and ^2 are defined in the Appendix. 


Proof: See the Appendix. 


Formula ® says that, for any particular model of the general multivariate elliptical class of models (O, it is always 
possible to express the bias of 0 as the solution of an weighted least-squares regression. Also, if Zi ^ Nq.(0, E^) then 
Ci = -Ui = 1, r]u = 0, rj2i = -2, H = H, 


V2(4®a,M)Aj’ 

and formula ([8]l reduces to the one obtained by Patriota and Lemonte (2009). 

Theorem 3.1 implies that all one needs to compute bias-corrected and bias-reduced MLEs in the general elliptical 
model is: (i) the first and second derivatives of the location vector fii and the scale matrix E^ with respect to all the 
parameters; (2) the derivatives Wg(u); (3) some moments involving the chosen elliptical distribution (these moments 
are given in Table 2 for some elliptical distributions). With these quantities, the matrices in ® can be computed and the 
bias vector can be computed through an weighted least-squares regression. 
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4 Special models 


In this section, we present four important particular cases of the main model Q. All special cases presented in Patriota 
and Lemonte (2009) are also special cases of the general multivariate elliptical model defined in this paper. 


4.1 Heteroscedastic nonlinear models 

Consider the univariate heteroscedastic nonlinear model defined by 

Yi = f{xi,a) + Ci, i = l,2,...,n, 

where Yi is the response, Xi is a column vector of explanatory variables, a is a column vector pi x 1 of unknown 
parameters and / is a nonlinear function of a. Assume that ei, 62 ,..., are independent, with ^ El{0, af). Here 
af = erf (7) = h{ujJ 7), where 7 is a p 2 x 1 vector of unknown parameters. Then 

yYZ^ El{f{xi, a), af), 

which is a special case of Q with 6 = (a^, 7 ^)^, fii = f{xi,a) and = af. Here El stands for Eli. Notice that for 
the heteroscedastic linear model f{xi,a) = xja. 

The second-order bias vector B-g{0) comes from (l 8 ]l, which depends on derivatives of f{xi, a) and erf with respect 
to the parameter vector 6. Also, it depends on the quantities ^/’i( 2 .i), V'i( 2 , 2 )i V'i( 3 , 2 )> V'i( 3 , 3 ) (s®® Table 2 ) and Wg{ui) 
containing information about the adopted distribution. 


4.2 Nonlinear mixed-effects model 


One of the most important examples is the nonlinear mixed-effects model introduced by Lange et al. (1989) and studied 
under the assumption of a Student t distribution. Let 

Yi — cr) “t“ Zjhi Uij 


where Yi is the qi x 1 vector response, pi is a -dimensional nonlinear function of a, Xi is a vector of nonstochastic 
covariates, Zi is a matrix of known constants, a is a pi x 1 vector of unknown parameters and is an r x 1 vector of 
unobserved random regression coefficients. Assume that. 


h 

Ui 


- El 


r+qi 


0 

0 


0 ]\ 

0 R^h2)\J^ 


where 71 is a p 2 -dimensional vector of unknown parameters and 72 is a pa x 1 vector of unknown parameters. Fur¬ 
thermore, the vectors (&i, rti)^, ( 62 , ^ 2 )^, ■ • ■, (bn, are independent. Therefore, the marginal distribution of the 
observed vector is 


Yi ^ Elq. {fj.i{xi,a)-,Y,i{Zi,j)) , (9) 

where 7 = ™d Y,i{Zi,j) = ZiYi,{'^i)Zj + i?i( 72 ). Equation @ is a special case of (O with 6 = 

(a^, 7 ^)^, Hi = 01 ) and = Ei(Zi, 7 ). From (| 8 ]l one can compute the bias vector B-g(6). 
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4.3 Errors-in-variables model 


Consider the model 


xii = j5o + l3iX2i-\-qi, i = l,...,n, 

where xu is a r; x 1 latent response vector, X 2 i is a m x 1 latent vector of covariates, /3o is a r; x 1 vector of intercepts, /3i 
is a t; X m matrix of slopes, and qi is the equation error having a multivariate elliptical distribution with location vector 
zero and scale matrix The variables xu and X 2 i are not directly observed, instead surrogate variables Xu and X 2 i 
are measured with the following additive structure: 


^li — X\i “t“ and X2i — X2i “t“ ^X2i- 


( 10 ) 


The random quantities X 2 i, qi, Sxn and Sx^i are assumed to follow an elliptical distribution given by 


/ X2t\ 



^ 9‘X2^ 


^^X2 

0 

0 



qi 

ind jpj 

ll/L2v-\-2m 


0 


0 


0 

0 


^xii 


0 

1 

0 

0 

Tx^i 

0 





^ 0 ^ 


^ 0 

0 

0 

TX 2 J 



where the matrices Txi and tu are known for alH = 1,... ,n. These “known” matrices may be attained, for example, 
through an analytical treatment of the data collection mechanism, replications, machine precision, etc (Kulathinal et al. 
( 2002 )). 

Therefore, the observed vector Yi = {Xu, has marginal distribution given by 




( 11 ) 


with 


fi{e) = 


f 1^0 + 

V J 


and Ei( 0 ) 


/ Pl'^X2Pj + Eg + Txx 

\ ^X2PJ 


Pl^X2 \ 

^X2 T Tx2i J 


where 9 = vec(/3i)^, vech(Ea; 2 )^, vech(Eg)^)"'", “vech” operator transforms a symmetric matrix into a vec¬ 

tor by stacking into columns its diagonal and superior diagonal elements. The mean vector (9) and the covariance- 
variance matrix Si(9) of observed variables have the matrix /3i in common, i.e., they share mv parameters. Kulathinal 
et al. ( 2002 ) study the linear univariate case {v = 1, m = 1 ). 


Equation (fTTI) is a special case of (O with qi = v + m, 9 = (a^, 7 ''")"'', fii = iJ,i{9) and E^ = Ei(0). In this case, a 
programming language or software that can perform operations on vectors and matrices, e.g. Ox (Doornik, 2013) and R 
(Ihaka and Gentleman, 1996), can be used to obtain the bias vector Bg{9) from ®. 


4.4 Log-symmetric regression models 

Let T be a continuous positive random variable with probability density function 


fT{t-,ri,(j),g) 




9 log 



, r] > 0, (j) > 0, 


( 12 ) 
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where g is the density generating function of a univariate elliptical distribution, and we write T ^ LS{ri, cj), g). Vanegas 
and Paula (2014) called the class of distribution in (fT^ the log-symmetric class of distributions. It includes log-normal, 
log-Student t, log-power-exponential distributions, among many others, as special cases. It is easy to verify that log(T) 
has a univariate elliptical distribution (i.e., symmetric distribution) with location parameter /i = log(r 7 ) and scale param¬ 
eter (j). The parameter 77 is the median of T, and (p can be interpreted as a skewness or relative dispersion parameter. 

Vanegas and Paula (2015) dehned and studied semi-parametric regression models for a set Ti,T 2 ,.. ■ ,Tn with 
Ti ^ LS{r]i, (j)i,g) with rji > 0 and pi > 0 following semi-parametric regression structures. Here we assume parametric 
specihcation for rji and pi as rji = rii{xi, a) and pi = pi{uji, 7). 

Hence, 


Yi = log{TP ^ El{iJ,t{xi,a),pi{uJi,^)), (13) 

where g,i{xi, a) = log( 77 (xi, a)). Therefore, ( fTSl l is a special case of the general elliptical model Q, and formula ([ 8 ]l 
applies. 


5 Simulation results 

In this section, we shall present the results of Monte Carlo simulation experiments in which we evaluate the hnite sample 
performances of the original MLEs and their bias-corrected and bias-reduced versions. The simulations are based on the 
univariate nonlinear model without random effects (Section 4.1) and the errors-in-variables model presented in Section 
4.2, when Yi follows a normal distribution, a Student t distribution with v degrees of freedom, or a power exponential 
distribution with shape parameter A. For all the simulations, the number of Monte Carlo replications is 10,000 (ten 
thousand) and they have been performed using the Ox matrix programming language (Doornik, 2013). 

First consider the model described in (|9]l with qi = 1, Zi = 0, Yi = and 

Ql 2 

g.i{a) = ^ii{x^,a) = ai +— - i = (14) 

1 -I- asx^ * 

Here the unknown parameter vector is 0 = (ai, 02, cng, a4, . The values of cci were obtained as random draws from 

the uniform distribution U (0,100). The sample sizes considered are n = 10, 20, 30,40 and 50. The parameter values 
are ai = 50, 02 = 500, as = 0.50, a 4 = 2 and af — 200. For the Student t distribution, we fixed the degrees of 
freedom at = 4, and for the power exponential model the shape parameter is fixed at A = 0.8. 

Tables 3-4 present the bias, and the root mean squared errors (s/MSE) of the maximum likelihood estimates, the 
bias-corrected estimates and the bias-reduced estimates for the nonlinear model with normal and Student t distributed 
errors, respectively. To save space, the corresponding results for the power exponential model are not shown0 We note 
that the bias-corrected estimates and the bias-reduced estimates are less biased than the original MLE for all the sample 
sizes considered. For instance, when n = 20 and the errors follow a Student t distribution (see Table 4) the estimated 
biases of are —41.24 (MLE), —12.30 (bias-corrected) and —4.55 (bias-reduced). For the normal case with n = 10 
(see Table 3), the estimated biases of Q!2 are 2.16 (MLE), 0.70 (bias-corrected) and —0.27 (bias-reduced). We also 
observe that the bias-reduced estimates are less biased than the bias-corrected estimates in most cases. As n increases, 
the bias and the root mean squared error of all the estimators decrease, as expected. Additionally, we note that the MLE 

' All the omitted tables in this paper are presented in a supplement available from the authors upon request. 
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of a 2 has ^/MSE larger than those of the modified versions. For the estimation of cr^, \/MSE is smaller for the original 
MLE. In other cases, we note that the estimators have similar root mean squared errors. 


Table 3: Biases and \/MSE of the maximum likelihood estimate and its adjusted versions; nonlinear model; normal 
distribution. 




MLE 

Bias-corrected MLE 

Bias-reduced MLE 

n 

e 

Bias 


Bias 

y/W^ 

Bias 

y/M^ 


ai 

-0.29 

6.69 

-0.13 

6.67 

-0.01 

6.67 


<32 

2.16 

20.07 

0.70 

19.40 

-0.27 

19.06 

10 

(33 

0.01 

0.13 

0.00 

0.12 

0.00 

0.12 


0^4 

0.03 

0.30 

0.01 

0.29 

-0.00 

0.29 


<72 

-80.05 

106.44 

-32.06 

103.32 

9.09 

128.72 


<31 

-0.08 

4.07 

-0.01 

4.07 

0.01 

4.07 


<32 

0.66 

17.94 

-0.08 

17.84 

-0.27 

17.82 

20 

<33 

0.00 

0.09 

0.00 

0.09 

-0.00 

0.09 


0^4 

0.02 

0.21 

0.01 

0.20 

0.00 

0.20 


<72 

-40.07 

69.73 

-8.09 

68.95 

0.86 

72.02 


<3i 

-0.10 

3.11 

-0.04 

3.10 

-0.02 

3.10 


02 

0.71 

17.24 

-0.05 

17.15 

-0.18 

17.13 

30 

<33 

0.00 

0.09 

-0.00 

0.09 

-0.00 

0.09 


0^4 

0.02 

0.20 

0.00 

0.19 

0.00 

0.19 


<72 

-26.41 

55.26 

-3.26 

55.11 

0.82 

56.32 


<3l 

-0.08 

2.69 

-0.02 

2.69 

-0.01 

2.69 


02 

0.83 

16.80 

0.09 

16.70 

0.01 

16.69 

40 

O 3 

0.00 

0.09 

0.00 

0.09 

0.00 

0.09 


0^4 

0.02 

0.19 

0.00 

0.18 

-0.00 

0.18 


<72 

-20.04 

47.26 

-2.04 

47.13 

0.33 

47.74 


Ol 

-0.08 

2.39 

-0.03 

2.38 

-0.02 

2.38 


02 

1.07 

14.25 

0.30 

14.12 

0.23 

14.11 

50 

03 

0.00 

0.08 

0.00 

0.08 

0.00 

0.08 


a4 

0.01 

0.19 

0.00 

0.18 

-0.00 

0.18 


<72 

-15.93 

41.41 

-1.21 

41.30 

0.36 

41.67 


We now consider the errors-in-variables model described in (fTOl) . The sample sizes considered are n = 15, 25,35 and 
50. The parameter values are /3o = 0.70 l„xi^ /3i = 0.40 l„xm, = 70 lmxi> = 40 7i,xi and = 250 Imxi- 
Here, l^xs is as r x s matrix of ones and I^xsi^ the identity matrix with dimension rxs. For the Student t distribution, 
we fixed the degrees of freedom at z/ = 4 and, for power exponential model, the shape parameter was fixed at A = 0.7. 
We consider u G {1, 2} and m = 1. 

In Tables 5-6, we present the MLE, the bias-corrected estimates, the bias-reduced estimates, and corresponding 
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Table 4: Biases and VMSE of the maximum likelihood estimate and its adjusted versions; nonlinear model; Student t 
distribution. 




MLE 

Bias-corrected MLE 

Bias-reduced MLE 

n 

e 

Bias 


Bias 


Bias 



ai 

-0.51 

8.66 

-0.31 

8.63 

-0.20 

8.56 


0(2 

3.34 

28.47 

1.39 

27.34 

2.05 

27.67 

10 

0(3 

0.01 

0.17 

0.00 

0.16 

0.01 

0.16 



0.06 

0.42 

0.03 

0.39 

-0.01 

0.38 



-93.18 

127.60 

-54.24 

130.73 

-17.40 

170.35 


0(1 

-0.17 

5.03 

-0.07 

5.02 

-0.04 

5.01 


a2 

2.01 

25.64 

0.91 

25.11 

1.29 

24.98 

20 

0(3 

0.01 

0.14 

0.01 

0.14 

0.01 

0.13 


ai 

0.04 

0.29 

0.01 

0.28 

0.00 

0.27 



-41.24 

85.51 

-12.30 

89.41 

-4.55 

93.08 


0(1 

-0.10 

3.81 

-0.01 

3.80 

0.01 

3.82 


a2 

2.25 

25.75 

1.13 

25.34 

1.61 

25.41 

30 

a3 

0.01 

0.14 

0.01 

0.14 

0.01 

0.14 


a4 

0.04 

0.29 

0.01 

0.27 

0.00 

0.26 



-27.15 

70.02 

-6.15 

72.64 

-1.78 

107.53 


0(1 

-0.10 

3.27 

-0.02 

3.26 

-0.01 

3.26 


a2 

1.82 

24.94 

0.75 

24.67 

1.18 

24.78 

40 

a3 

0.01 

0.12 

0.00 

0.12 

0.01 

0.12 


a4 

0.03 

0.26 

0.01 

0.25 

0.00 

0.25 



-20.38 

60.43 

-4.01 

62.21 

-1.82 

62.98 


ai 

-0.13 

2.86 

-0.05 

2.85 

-0.03 

2.85 


a2 

1.48 

18.86 

0.38 

18.59 

0.24 

18.46 

50 

a3 

0.01 

0.11 

0.00 

0.11 

0.00 

0.11 


a4 

0.02 

0.24 

0.00 

0.23 

0.00 

0.23 


CT^ 

-15.40 

53.99 

-1.94 

55.56 

-0.43 

56.11 
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estimated root mean squared errors for the Student t and power exponential distributions, for the errors-in-variables 
model. The results for the normal distribution are not shown to save space. We observe that, in absolute value, the 
biases of the bias-corrected estimates and bias-reduced estimates are smaller than those of the original MLE for different 
sample sizes. Furthermore, the bias-reduced estimates are less biased than the bias-corrected estimates in most cases. 
This can be seen e.g. in Table 6 when v = 1, m = 1, Yi follows a power exponential distribution and n = 15. In this 
case, the bias of the MLE, the bias-corrected estimate and the bias-reduced estimate of Eg are —4.92, —0.66 and —0.17, 
respectively. When Yi follows a Student t distribution, n = 15, v = 1 and m = 1 we observe the following biases of the 
estimates of 5.18 (MLE), 2.91 (bias-corrected) and 2.66 (bias-reduced); see Table 5. We note that the root mean 
squared errors decrease with n. 

For the sake of saving space, the simulation results for the normal. Student t and power exponential errors-in-variable 
models with v = 2 and m = 1 are not presented. Overall, our findings are similar to those reached for the other models. 


Table 5: Biases and VMSE of the maximum likelihood estimate and its adjusted versions; errors-in-variables model; 
r; = 1 and m = 1; Student t distribution. 




MLE 

Bias-corrected MLE 

Bias-reduced MLE 

71 

9 

Bias 

y'MSE 

Bias 

y'MSE 

Bias 

y'MSE 


/3o 

-0.00 

9.90 

0.01 

9.90 

0.01 

9.89 


/3i 

0.00 

0.14 

0.00 

0.14 

0.00 

0.14 

15 

M£C2 

0.05 

4.82 

0.05 

4.82 

0.05 

4.82 



5.18 

129.34 

2.91 

128.12 

2.66 

127.86 



-3.64 

19.52 

-0.68 

20.72 

-0.42 

20.85 


/3o 

-0.02 

7.14 

-0.01 

7.14 

-0.01 

7.14 


Pi 

0.00 

0.10 

0.00 

0.10 

0.00 

0.10 

25 

1^X2 

0.03 

3.69 

0.03 

3.69 

0.03 

3.69 


Yx2 

3.61 

97.32 

2.25 

96.76 

2.17 

96.69 



-2.31 

14.87 

-0.47 

15.41 

-0.38 

15.44 


Po 

-0.02 

5.93 

-0.02 

5.93 

-0.02 

5.93 


Pi 

0.00 

0.08 

0.00 

0.08 

0.00 

0.08 

35 

9'X2 

-0.01 

3.12 

-0.01 

3.12 

-0.01 

3.12 


'^X2 

1.94 

79.78 

0.98 

79.45 

0.94 

79.44 



-1.65 

12.63 

-0.31 

12.96 

-0.26 

12.97 


Pn 

-0.01 

4.92 

-0.01 

4.92 

-0.01 

4.92 


Pi 

0.00 

0.07 

0.00 

0.07 

0.00 

0.07 

50 

9'X2 

0.01 

2.59 

0.01 

2.59 

0.01 

2.59 


Yx2 

1.04 

65.50 

0.37 

65.33 

0.36 

65.33 


Y, 

-1.18 

10.53 

-0.24 

10.78 

-0.21 

10.79 
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Table 6: Biases and y/MSE of the maximum likelihood estimate and its adjusted versions; errors-in-variables model; 
u = 1 and m = 1; power exponential distribution. 




MLE 

Bias-corrected MLE 

Bias-reduced MLE 

71 

e 

Bias 


Bias 


Bias 



/3o 

-0.12 

9.25 

-0.11 

9.25 

-0.11 

9.24 


Pi 

0.00 

0.13 

0.00 

0.13 

0.00 

0.13 

15 


-0.02 

6.47 

-0.02 

6.47 

-0.02 

6.47 



-9.27 

103.32 

0.52 

107.51 

0.82 

107.64 



-4.92 

15.67 

-0.66 

17.55 

-0.17 

17.76 


Po 

0.02 

6.83 

0.03 

6.83 

0.03 

6.83 


Pi 

0.00 

0.09 

-0.00 

0.09 

-0.00 

0.09 

25 

t^X2 

-0.02 

4.98 

-0.02 

4.98 

-0.02 

4.98 


'^X2 

-5.60 

80.20 

0.36 

81.95 

0.47 

81.99 



-3.04 

12.94 

-0.36 

13.49 

-0.18 

13.54 


Po 

0.01 

5.59 

0.02 

5.58 

0.02 

5.58 


Pi 

-0.00 

0.08 

-0.00 

0.08 

-0.00 

0.08 

35 

t^X2 

-0.04 

4.21 

-0.04 

4.21 

-0.04 

4.21 


'Ex2 

-3.53 

68.01 

0.77 

69.10 

0.82 

69.12 


s. 

-2.14 

11.11 

-0.18 

11.46 

-0.08 

11.49 


Po 

0.03 

4.67 

0.03 

4.67 

0.03 

4.67 


Pi 

-0.00 

0.06 

-0.00 

0.06 

-0.00 

0.06 

50 

iJ'X2 

-0.03 

3.52 

-0.03 

3.52 

-0.03 

3.52 


'Ex2 

-2.83 

56.89 

0.18 

57.51 

0.21 

57.52 



-1.51 

9.21 

-0.12 

9.41 

-0.07 

9.42 
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6 Applications 


6.1 Radioimmunoassay data 

Tiede and Pagano (1979) present a dataset, referred here as the radioimmunoassay data, obtained from the Nuclear 
Medicine Department at the Veterans Administration Hospital, Buffalo, New York. Lemonte and Patriota (2011) ana¬ 
lyzed the data to illustrate the applicability of the elliptical models with general parameterization. Following Tiede and 
Pagano (1979) we shall consider the nonlinear regression model (O, with n = 14. The response variable is the observed 
radioactivity (count in thousands), the covariate corresponds to the thyrotropin dose (measured in micro-international 
units per milliliter) and the errors follow a normal distribution or a Student t distribution with ly = 4 degrees of free¬ 
dom. We assume that the scale parameter is unknown for both models. In Table 7 we present the maximum likelihood 
estimates, the bias-corrected estimates, the bias-reduced estimates, and the corresponding estimated standard errors are 
given in parentheses. We note that all the estimates present smaller standard errors under the Student t model than under 
the normal model (Table 7). 

For all parameters, the original MLEs are very close to the bias-corrected MLE and the bias-reduced MLE when the 
Student t model is used. However, under the normal model, significant differences in the estimates of ai are noted. The 
estimates for ai are 0.44 (MLE), 0.65 (bias-corrected MLE) and 1.03 (bias-reduced MLE). 


Table 7: Estimates and standard errors (given in parentheses); radioimmunoassay data. 


Normal distribution 

e 

MLE 

Bias-corrected MLE 

Bias-reduced MLE 

ai 

0.44 (0.80) 

0.65 (0.99) 

1.03 (1.06) 

Q!2 

7.55 (0.95) 

7.34 (1.16) 

6.91 (1.25) 

as 

0.13 (0.06) 

0.13 (0.06) 

0.13 (0.08) 

a4 

0.96 (0.24) 

0.93 (0.28) 

0.95 (0.34) 


0.31 (0.12) 

0.40 (0.15) 

0.50 (0.19) 

Student t distribution 

e 

MLE 

Bias-corrected MLE 

Bias-reduced MLE 

ai 

0.90 (0.12) 

0.91 (0.13) 

0.90 (0.15) 

02 

7.09 (0.17) 

7.08 (0.19) 

7.07 (0.22) 

as 

0.09 (0.01) 

0.09 (0.01) 

0.09 (0.02) 

04 

1.31 (0.08) 

1.31 (0.09) 

1.29 (0.10) 


0.02 (0.01) 

0.02 (0.01) 

0.03 (0.01) 


6.2 Fluorescent lamp data 

Rosillo and Chivelet (2009) present a dataset referred here as the fluorescent lamp data. The authors analyze the lifetime 
of fluorescent lamps in photovoltaic systems using an analytical model whose goal is to assist in improving ballast design 
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and extending the lifetime of fluorescent lamps. Following Rosillo and Chivelet (2009) we shall consider the nonlinear 
regression model (|9l) with qi = 1, Zi = 0, = a^, 9 = (a^, tr^) = (ao, ai, 02 , as, tr^) and 


1 I I I I 2 ’ ^ ■ ' ■ ’ 

1 + ao + ctiXii + a2Xi2 + 013 x 12 

where the response variable is the observed lifetime/advertised lifetime {Y), the covariates correspond to a measure of 
gas discharge (a;i) and the observed voltage/ad- vertised voltage (measure of performance of lamp and ballast - X 2 ) and 
the errors are assumed to follow a normal distribution. Here we also assume a Student t distribution with i/ = A degrees 
of freedom for the errors. 

In Table 8 we present the maximum likelihood estimates, the bias-corrected estimates, the bias-reduced estimates, 
and the corresponding estimated standard errors. As in the previous application, the estimates present smaller standard 
errors under the Student t model than under the normal model. 

The original MLEs for ao and as are bigger than the corresponding corrected and reduced versions by approximately 
one unit (normal and Student t models). The largest differences are among the estimates of a 2 ; for example, for the 
normal model we have —56.33 (MLE), —54.45 (bias-corrected MLE) and —53.86 (bias-reduced MLE). 

We now use the Akaike Information Criterion {AIC, Akaike, 1974), the Schwarz Bayesian criterion (BIG, Schwarz, 
1978) and the finite sample AIC (AICc, Hurvich and Tsai, 1989) to evaluate the quality of the normal and Student t 
fits. Eor the normal model we have AIC = —9.98, BIC = —6.79 and AICc = —2.48. Eor the t model we have 
AIC = —11.24, BIC = —8.04 and AICc = —3.74. Therefore, the t model presents the best fit for this dataset, since 
the values of the AIC, BIC and AICc are smaller. 

Let 

14 

1=1 

where Y and are the vectors of predicted values computed from the model fit for the whole sample and the sample 
without the jth observation, respectively. The quantity D measures the total effect of deleting one observation in the 
predicted values. Eor a fixed sample size, it tends to be high if a single observation can highly influence the prediction 
of new observations. We have D = 0.119,0.120, and 0.123 (normal model) and D = 0.101, 0.100, and 0.095 (Student 
t model) when using the MLE, the bias-corrected estimate, and the bias-reduced estimate, respectively. Notice that D 
is smaller for the Student t model regardless of the estimate used. This is evidence that the Student t model is more 
suitable than the normal model for predicting lifetime of fluorescent lamps in this study. 


6.3 WHO MONICA data 

We now turn to a dataset from the WHO MONICA Project that was considered in Kulathinal et al. (2002). This dataset 
was first analyzed under normal distributions for the marginals of the random errors (Kulathinal et al. 2002; Patriota 
et al. 2009a). Thereafter, it was studied under a scale mixture of normal distributions for the marginals of the random 
errors (Cao et al., 2012). The approach used in the present paper is different from the others because here we consider a 
joint elliptical distribution for the vector of random errors. The other authors assumed that the distributions of the errors 
were independent, while we assume that they are uncorrelated but not independent. Eor our proposal, the errors will 
only be independent under normality. 
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Table 8 : Estimates and standard errors (given in parentheses); fluorescent lamp data. 


Normal distribution 

e 


MLE 

Bias-corrected MLE 

Bias-reduced MLE 

ao 

29.49 

(5.21) 

28.54 

(5.66) 

28.25 

(5.84) 

ai 

9.99 

(4.69) 

9.68 

(5.21) 

9.62 

(5.42) 

0(2 

-56.33 

(10.10) 

-54.45 

(10.93) 

-53.86 

(11.26) 

as 

26.53 

(4.89) 

25.61 

(5.28) 

25.31 

(5.43) 


1.40 X 10- 

-2 (5.00 X 10-3) 

1.80 X 10-2 

(7.00 X 10-3) 

1.90 X 10-2 

(7.00 X 10-3) 

Student t distribution 

e 


MLE 

Bias-corrected MLE 

Bias-reduced MLE 

ao 

30.66 

(4.64) 

29.94 

(5.05) 

29.85 

(5.20) 

at 

8.48 

(4.00) 

8.24 

(4.42) 

8.46 

(4.57) 

a2 

-58.20 

(8.94) 

-56.79 

(9.71) 

-56.67 

(10.00) 

as 

27.27 

(4.30) 

26.58 

(4.66) 

26.55 

(4.80) 


7.30 X 10- 

-3 (3.60 X 10-3) 

9.20 X 10-3 

(4.60 X 10-3) 

9.80 X 10-3 

(4.90 X 10-3) 


The dataset considered here corresponds to the data collected for men (n = 38). As describe in Kulathinal et al. 
(2002), the data are trends of the annual change in the event rate {y) and trends of the risk scores (x). The risk score is 
defined as a linear combination of smoking status, systolic blood pressure, body mass index, and total cholesterol level. 
A follow-up study using proportional hazards models was employed to derive its coefficients, and provides the observed 
risk score and its estimated variance. Therefore, the observed response variable, Xi, is the average annual change in 
event rate (%) and the observed covariate, X 2 , is the observed risk score (%). We use the heteroscedastic model (fTOl l 
with V = m = 1 and zero covariance between the errors and Sx 2 i- 

Table 9 gives the MLE and the bias-corrected/reduced estimates (standard errors are given in parentheses). We 
considered the full sample (n = 38) and randomly chosen sub-samples of n = 10, 20 and 30 observations. 

The original MLEs for /3o, Pi and fj,x 2 ^6 practically the same as their bias-corrected and bias-reduced versions for 
all sample sizes. The largest differences are among the estimates of S^; for example, for n = 10 we have 6.17 (MLE), 
8.14 (bias-corrected MLE) and 8.81 (bias-reduced MLE). In general, as expected, larger sample sizes correspond to 
smaller standard errors. 


7 Concluding remarks 

We studied bias correction and bias reduction for a multivariate elliptical model with a general parameterization 
that unifies several important models (e.g., linear and nonlinear regressions models, linear and nonlinear mixed models, 
errors-in-variables models, among many others). We extend the work of Patriota and Lemonte (2009) to the elliptical 
class of distributions defined in Lemonte and Patriota (2011). We express the second order bias vector of the maximum 
likelihood estimates as an weighted least-squares regression. 
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Table 9: Estimates and standard errors (given in parentheses); WHO MONICA data. 


n 

0 

MLE 

Bias-corrected MLE 

Bias-reduced MLE 


/3o 

-2.58 (1.34) 

-2.58 (1.44) 

-2.45 (1.47) 


Pi 

0.05 (0.60) 

0.05 (0.63) 

0.07 (0.64) 

10 

1 ^X2 

-1.54 (0.58) 

-1.54 (0.61) 

-1.53 (0.62) 


^X2 

2.89 (1.50) 

3.22 (1.65) 

3.29 (1.69) 



6.17 (3.99) 

8.14 (4.93) 

8.81 (5.25) 


Po 

-2.68 (0.65) 

-2.69 (0.68) 

-2.69 (0.69) 


Pi 

0.48 (0.30) 

0.47 (0.31) 

0.43 (0.31) 

20 

1 ^X2 

-1.29 (0.44) 

-1.29 (0.46) 

-1.29 (0.46) 


^X2 

3.53 (1.25) 

3.73 (1.31) 

3.76 (1.32) 



3.00 (1.66) 

3.59 (1.87) 

3.73 (1.92) 


Po 

-2.22 (0.54) 

-2.22 (0.55) 

-2.20 (0.55) 


pi 

0.43 (0.24) 

0.43 (0.25) 

0.42 (0.25) 

30 

P'X2 

-0.77 (0.42) 

-0.77 (0.42) 

-0.77 (0.42) 


^X2 

4.71 (1.34) 

4.88 (1.39) 

4.89 (1.39) 



4.36 (1.86) 

4.89 (2.01) 

4.88 (2.01) 


Po 

-2.08 (0.53) 

-2.08 (0.54) 

-2.08 (0.54) 


pi 

0.47 (0.23) 

0.47 (0.24) 

0.46 (0.24) 

38 

1 ^X2 

-1.09 (0.36) 

-1.09 (0.36) 

-1.09 (0.36) 


^X2 

4.32 (1.10) 

4.44 (1.13) 

4.45 (1.13) 



4.89 (1.78) 

5.34 (1.89) 

5.30 (1.88) 
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As can be seen in our simulation results, corrected-bias estimators and reduced-bias estimators form a basis of 
asymptotic inferential procedures that have better performance than the corresponding procedures based on the original 
estimator. We further note that, in general, the bias-reduced estimates are less biased than the bias-corrected estimates. 
Computer packages that perform simple operations on matrices and vectors can be used to compute bias-corrected and 
bias-reduced estimates. 


Appendix 

Lemma A.l. Let Zi ^ Elq^ (0, Si, g), and Ci and V'i( 2 .i) as previously defined. Then, 

E{viZi) = 0 , 

E{vjzizJ) = 

Qi 

E{viWec{zizJ)zJ) = 0 , 

E{vi\ec{zizJ)vec{zizJ)~'^) = Ci(vec(Si)vec(Ei)^ -f 2Ei 0 E^), 

E{vfvec{zizJ)vec{zizJ)^) = -c* (vec(Ei)vec(Ei)^ -f 2Ei 0 E^), 

SWj ^tT^ Ej j-tr^ Ej j-tr^ Ej} 

+ 2tr{Aj(t)Ei}tr{Aj(s)EiAj(r)Si} 

+ 2tr{Aj(s)Ei}tr{Aj(t)EiAj(r)Si} 

+ 2tr{Aj(^)Ei}tr{Aj(j)Ei Aj(s)Ei}) 

+ 8tr{Aj(j)EiAj(s)EiAj(r)Ei}), 

where c* = 8 - 01 ( 3 . 2 )/{*(ft + 2)}, 0 i( 3 . 2 ) = E(W^{r,)rf), Ui = 0 i( 3 . 3 )/{ft(ft + 2)(ft + 4)} and ^^( 3 ^ 3 ) = 
E{W^{n)r^). 


Proof: The proof can be obtained by adapting the results of Mitchell (1989) for a matrix version. 


From Lemma A.l, we can find the cumulants of the log-likelihood derivatives required to compute the second-order 
biases. 


Proof of Theorem 3.1: Following Cordeiro and Klein (1994), we write ([T]i in matrix notation to obtain the second-order 
bias vector of 9 in the form 

B^{e) = K{0)-^Wvec{K{e)-^), (15) 

where W = {W^^\ ..., is a p x partitioned matrix, each referring to the rth component of 6, being a 

p X p matrix with typical {t, s)th element given by 


(r) 


- 


(r) 1 8 ( 7 .) 1 , / f'l 

+ ^ts.r = ^ts ~ n^tsT = T^ts ~ 7(^4,s.r + l^sr 
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Because K{0) is symmetric and the fth element of Wvec{K{9) is + • • • + (^ 4 *^ + 

+ • • • + + wlp^K^’P, we may write 


= -(t 

O 




,,(f(s) _ (t) _ N 

hs ) — A \^ts ^ ^tr ^sr '^t,s,r)- 


(16) 


Comparing (fTSl l and (| 8 ]l we note that for the proof of this theorem it suffices to show that = Wvec{{F^ HF) ^), 

i.e.. 


Notice that 


~ I n^^{^iir)Ci(s)} 


4r(’i(2,r) 


q^ 




(Ci - 1) 


tr{Ai(s)S,}tr{Aj(r)S,} 


(17) 


The quantities V'i( 2 ,i) V'i( 2 , 2 ) do not depend on 0 and hence, the derivative of (fTTl i with respect to 9t is 

" fCi 
i=l 1 

'1 ^ f 

F Ci(^tr)^i{s)} r ~ ^ ' r ^i(r) F 

J 4^1 L 

+ ®i(s)^i +^1'^ -^C'i(ts)}h'{^j(r)Si} 


E 

i=l 


- Z - '^-^{Ai{t)Ci(r) + ^C'4(tr)}h'{^i(s)Si} 


Therefore, 


^st ^tr ^ir ^ ^ I 2 ^^'{^z(r) ^z^2(s) “f ^2(5) 

i=l 1 

+ 2Ci(rs)^i(t)}'i ~ E/i 7, ^i(sr) 

J i=r 1 * 

d” ^i{t)^i{s)^i(r) T ^i(s)-^i(r)^i(t) ~ ^i{s) ^i(t)^i(r)'} 

ic^ - 1 ). 


+ E 


■tr{^i(r)C'i(s) + Ci(rs)}tr{^i(t)Si} 
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Now, the only quantity that remains to obtain is Kt^s,r = E{UtUsUr). Noting that Zi is independent of Zj for i ^ j, 
we have 

i^t,s,r — g 'y ^ eI [tr{Aj ((^(Eli ViZiZ^ )}tr{Aj(5^(Sj ViZiZ^ )} 
i —1 ^ 

tr{^i(r)(Si - ViZizJ)}] +4tr{Ai(i)(Ei - ViZizJ)}(v1al^^-^E-^ 

ZizjE~^a^i^s)) +4tr{Ai(r)(Ei - ViZizJ)}{vial,^^-^E~^ZizJE~^ 

®i(s)) 4“ 4tr{Ajj-g)(Si — UiZiZj ZiZi S 0 ^( 7 .)) 

Then, by using Lemma A. 1 and from (fThl l. we have, after lengthy algebra, that 

n 

(18) 

i=l 

where 


and 


with 


4’4(r) ~ 2 




Ei(^r) EiFi 


m 

d9r 


2 \2Tj2iEi ^i[r) 2 (Ci \^S\i^^^J 


1 f piiSitlS^ } 2pii(Ji^^jVec(Si) \ 

4 277iiVec(Si)a^)'(^) 2(ci + 8uJi)S2^{r) J ’ 


mi = C* + 4-!/'i(2,l)/<7i, V2i = C* - ^i(2,l)lqi, 

Sii(r) = vec(Si)vec(C'i(r))^ + ivec(Sj)vec(Ei)^tr{C'i(r)S“^} and 
<S'2*(r) = vec(C'j(r))vec(Ei)"^ + vec(Ei)vec(C'j(r))"^ + 4Ei (g) C'j(r) 

+ [Si g) Sj + ivec(Sj)vec(Sj)^]tr{C'i(r)E“^}. 

Using (fTSl) and (fTSl) the theorem is proved. 

Proof of Corollary 3.1: It follows from Theorem 3.1, eq. (I 8 ]l, when 

F = block-diag{Tej,FeJ, H = block-diag{ffi,i/ 2 } and ^ = {£j )^, 
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T 


* ■ * ’ ^j{n) 


where Fq. = 

Pe^{.i) = d[vQc{Y.i)]/dej, # 1 ( 4 ) = 


and Hj = block-diag{iJj(i),..., for j 


4V’i(2,i) ip-l 


and H 2 (i) = Ci ( 21 ]^ (g) S*) ^ + {a - 


more, = 




l(n) 


and ^2 = 




■ • ■ ’ ?2(n) 


iT 


with 


= 1 , 2 , with = dfiildej, 

l)vec(E“^)vec(I]“^)^. Further- 


Ci(*) = - vec((J^g^iF(i)FeJ ^), 

= \m:p: vec((F,^iF(i)F,J-i) + i (m^Q* - 4 
vec((FTiJ( 2 )F,J-i). 


Also, i^f 


^l(*) 

p: = {p:i 




( 1 )’ 


i(i)> • ■ • ’'^^ 2(0 ~ ■ ■ ■ ’ — [Qi 

Cr(p,)i^ - i - I- i(i)> • ■ •, P*ip,)l w = ’ Pk(^ = ’ '^here 0i(,) and 02 (s) are the rth and sth 

elements of 9i and 6 * 2 , respectively, r = 1 ,... ,pi, s = 1 ,... ,p 2 and 


M* = -(t vec(S,)vec(I],)T \ 

* Ci V 2ci + vec(Sj)T^vec(Si)/ ’ 

Qi{s) = ~ l)'S'ii(s) + -^{ci + 8wi)S'2j(s)^ (Sj ^ (g) Sj -Fe2(i), 

-fi(r) ~ (^V2iilqi ® O-i(r)) + t7liVec(I]i)a^(^^^ Sj ^ -Fgj(i). 
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